(57) Abstract:

PROBLEM TO BE SOLVED: To provide a method for 
selecting primer base sequences in which the base 
sequences best suited to the differential display 
method are obtained from an expressed gene database 
and processed, and primers are then selected by a 
genetic algorithm using the frequently appearing 
sequences as candidates. 

SOLUTION: This method for selecting primer base 
sequences makes the differential display method more 
convenient to use. The differential display method is an 
experimental method for elucidating the structure and 
function of genes and molecular evolution, and for 
obtaining new genes. The base sequences best suited to 
the differential display method are obtained from an 
expressed gene database and processed. The most 
frequently appearing base sequences are taken as primer 
candidates in order of frequency, and optimal primers 
are then selected from these candidates by a genetic 
algorithm. When a sequence in the expressed gene 
database has fewer than 1,000 bases, it is supplemented 
by screening for homology with a genomic base sequence 
and overlapping the identical part. 
[Claim(s)] 

[Claim 1] A method for selecting primer base sequences, comprising: (a) a step of acquiring 
and processing, from an expressed gene database, the base sequences best suited to the 
differential display method, and taking the base sequences that appear most frequently, in 
order of frequency, as primer candidates; and (b) a step of selecting an optimal primer 
group, using a genetic algorithm, from the primer candidates obtained in step (a). 

[Claim 2] The method for selecting primer base sequences according to claim 1, wherein, in 
acquiring the base sequences from the expressed gene database, a sequence in the expressed 
gene database that is shorter than 1,000 bases is complemented by searching for homology 
with a genome base sequence and overlapping at the identical portion. 

[Claim 3] The method for selecting primer base sequences according to claim 1, wherein the 
frequency of each base sequence is determined by a high-speed homology search optimized for 
exact matches of short sequences. 

[Claim 4] A primer base sequence selection apparatus equipped with a processor that executes 
instructions, an input device that inputs data, a store that holds the data input from the 
input device, and an output unit that outputs data, the apparatus comprising: (a) an 
input-process section that inputs an expressed gene database read from the store and a 
genome base sequence; (b) a new database creation processing section that processes the data 
from the input-process section and creates a new database; (c) a processing section that 
counts, for each entry, the complementary locations of short base sequences, creates a 
complementary location database, and sorts the results; (d) a primer candidate base sequence 
output section that outputs primer candidate base sequences based on the data from the 
processing section; (e) a genetic algorithm processing section that performs genetic 
algorithm processing based on the data from the primer candidate base sequence output 
section; and (f) an optimal primer candidate base sequence group output section that 
outputs an optimal primer candidate base sequence group based on the data from the genetic 
algorithm processing section. 
DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to a method and apparatus for selecting primer 
base sequences by a genetic algorithm. The method makes it convenient to use the differential 
display method, an experimental technique for studying the structure, function, and molecular 
evolution of genes and for acquiring new genes. 
[0002] 

[Description of the Prior Art] The genes of a living organism are information encoded in four 
kinds of bases on an organic chemical substance known as DNA. The gene base sequence on DNA 
is transcribed into RNA and translated into protein. Through this flow of genetic information, 
a living organism builds its body, copes with the changing external world, produces offspring, 
and eventually dies. A gene is, in other words, like an engineering drawing or a guide for 
the activities of life. 

[0003] In recent years, DNA sequencing techniques have quickly determined large quantities of 
data. These include the genomic DNA sequences that carry all genetic information and the base 
sequences of genes actually discovered. 

[0004] Computers have been introduced into molecular biology to manage and analyze these 
data. The interdisciplinary field of information science and molecular biology, known as 
computational molecular biology or bioinformatics (life information science), is advancing. 

[0005] In the field of molecular biology, various methods have been developed and used to 
study gene expression. One of them, now widely used, is the differential display method, a 
kind of PCR (polymerase chain reaction). It is a laboratory procedure that performs PCR on 
cDNA (a population of cDNAs) created from mRNA. The reaction uses a primer whose base 
sequence is expected to bind to as many cDNAs as possible, together with a primer 
corresponding to the 3'-end poly-A sequence. It is an excellent technique that can examine 
the expression of many genes simultaneously and systematically. 
[0006] 

[Problem(s) to be Solved by the Invention] However, databases have been used in the 
differential display method only to infer the type of a gene after the base sequence of the 
obtained gene has been determined. Programs exist that select a primer for the PCR method 
from a single DNA sequence. However, a system that selects primers covering more base 
sequences is difficult to realize with current programs, so primer selection for the 
differential display method has had to rely on rules of thumb. Moreover, there has been no 
means of knowing which primers to choose so that duplication is small and new genes are 
picked up. 

[0007] This invention aims to solve the above problems. It provides a method and apparatus for 
selecting optimal primer base sequences, so that more gene products can be compared 
efficiently by the differential display method and the probability of acquiring new genes is 
raised. 
[0008] 

[Means for Solving the Problem] To achieve the above object, [1] the method for selecting 
primer base sequences according to this invention comprises two steps. (a) The base sequences 
best suited to the differential display method are acquired from an expressed gene database 
and processed, and the most frequently appearing base sequences are taken, in order of 
frequency, as primer candidates. (b) An optimal primer group is selected by a genetic 
algorithm from the primer candidates obtained in step (a). 

[0009] [2] In the method for selecting primer base sequences described in [1] above, a 
sequence in the expressed gene database that is shorter than 1,000 bases is complemented. 
This is done by searching for homology with a genome base sequence and overlapping at the 
identical portion. 
[0010] [3] In the method for selecting primer base sequences described in [1] above, the 
frequency of each base sequence is determined by a high-speed homology (similarity) search 
optimized for exact matches of short sequences. 
[0011] [4] A primer base sequence selection apparatus is provided. It is equipped with a 
processor that executes instructions, an input device that inputs data, a store that holds 
the data input from the input device, and an output unit that outputs data. It comprises: 
an input-process section that inputs the expressed gene database read from the store and a 
genome base sequence; a new database creation processing section that processes the data 
from the input-process section and creates a new database; a processing section that counts, 
for each entry, the complementary locations of short base sequences, creates a complementary 
location database, and sorts the results; a primer candidate base sequence output section 
that outputs primer candidate base sequences based on the data from the processing section; 
a genetic algorithm processing section that performs genetic algorithm processing based on 
the data from the primer candidate base sequence output section; and an optimal primer 
candidate base sequence group output section that outputs an optimal primer candidate base 
sequence group based on the data from the genetic algorithm processing section. 

[0012] With the configuration described above, multiple optimal primers for differential 
display can be selected, which has been difficult until now. 

[0013] 

[Embodiment of the Invention] An embodiment of this invention is described in detail below 
with reference to the drawings. 

[0014] Drawing 1 is a configuration diagram of the primer base sequence selection system 
showing an example of this invention. Drawing 2 is an outline operation flowchart of the 
primer base sequence selection system. 

[0015] As shown in Drawing 1, the primer base sequence selection system of this invention has 
a processor (CPU/memory) 10 that executes instructions, an input unit 1 that inputs data, a 
store that holds the data input from the input unit 1, and an output unit 2 that outputs 
data. The processor 10 consists of the following sections: 
- an input-process section 3 for the EST (expressed gene) database and the genome base 
sequence; 
- a new database creation processing section 4 that processes the data of the EST (expressed 
gene) database and the genome base sequence; 
- a section 5 that counts the complementary locations of each entry and short base sequence, 
creates the complementary location database, and sorts the results; 
- a primer candidate base sequence output section 6; 
- a genetic algorithm processing section 7; and 
- an optimal primer candidate base sequence group (combined) output section 8. 

[0016] As shown in Drawing 2, the primer base sequence selection procedure is as follows. 
(1) The EST (expressed gene) database and a genome base sequence are input (step S1). 
(2) A new database is created by processing the data from the EST (expressed gene) database 
and the genome base sequence (step S2). 
(3) For each entry and short base sequence, complementary locations are counted, a 
complementary location database is created, and the results are sorted (step S3). 
(4) Primer candidate base sequences are output (step S4). 
(5) Primer candidate base sequence data are obtained. 
(6) It is checked whether to terminate (step S5). If YES, the primer candidate base sequence 
data are output and processing ends. If NO, processing proceeds to genetic algorithm 
processing (step S6). 
(7) The optimal primer candidate base sequence group (combined) is output (step S7). 

[0017] Steps S2 to S4 in Drawing 2 are shown in detail in Drawing 3. 

[0018] (1) Each entry in the EST (expressed gene) database is searched for the keyword 
"3'-end". For each matching entry, the sequence of 1,000 bases from the 3'-end is written out 
to another file. The differential display method detects sequences from the 3'-end poly-A 
sequence with good reproducibility. For that reason, the 1,000 bases from the 3'-end are taken 
from every cDNA base sequence in the EST (expressed gene) database and written out to another 
file (step S11). 

[0019] (2) Even when a single EST sequence is long, its accuracy is considered to fall 
progressively beyond about 500 bases from the start point of base sequence determination. 
In addition, present-day DNA sequencing techniques often record the base G (guanine) as X 
(undeterminable), so every X is corrected to G (step S12). 
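
As a non-authoritative illustration, steps S11 and S12 might be written in Python as below. The sketch assumes that each EST entry is a plain string written 5' to 3', so that the 3'-end is the end of the string. The names `three_prime_window`, `correct_undetermined` and `preprocess` are hypothetical and do not come from the patent.

```python
# Hypothetical sketch of steps S11-S12; the names are illustrative only.
WINDOW = 1000  # bases kept from the 3'-end, as stated in [0018]


def three_prime_window(seq: str, window: int = WINDOW) -> str:
    """Keep the last `window` bases (assumes the string runs 5' to 3').

    Sequences shorter than `window` are returned whole; these are the
    ones that step S14 later extends from the genome.
    """
    return seq[-window:]


def correct_undetermined(seq: str) -> str:
    """Step S12: rewrite bases recorded as X (undeterminable) as G."""
    return seq.upper().replace("X", "G")


def preprocess(entries: dict) -> dict:
    """Apply both steps to every entry of an {entry name: sequence} mapping."""
    return {name: correct_undetermined(three_prime_window(seq))
            for name, seq in entries.items()}
```

For example, `preprocess({"e1": "acgx"})` returns `{"e1": "ACGG"}`. The short entry is kept whole and its X is read as G.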

[0020] (3) Next, it is checked whether any of the corrected EST sequences are short, that is, 
shorter than 1,000 bases (step S13). 

[0021] (4) For the short sequences among the corrected EST sequences, a homology (similarity) 
search is then performed to locate and join complementary locations from a genome database 
(step S14). 

[0022] In this case, as shown in Drawing 4, the search checks the first 25 bases of the EST 
sequence, counted from the start of base sequence determination. When two or more 
non-complementary bases appear within them, the homology search starting position in the 
genome is shifted and the search is repeated. There are only four kinds of bases, so this 
optimization for finding the target overlapping location lets the homology search run much 
faster than a usual search algorithm. 
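
The shifted search of Drawing 4 can be sketched as below, under stated assumptions. The seed is compared base for base on the same strand, and strand handling is left out. One mismatch is tolerated, and a second one triggers the shift. `find_anchor` is a hypothetical name. The early exit on the second mismatch is what keeps the scan fast.

```python
SEED = 25         # bases compared from the start of the EST ([0022])
MAX_MISMATCH = 1  # assumption: a second mismatch triggers the shift


def find_anchor(est: str, genome: str,
                seed: int = SEED, max_mismatch: int = MAX_MISMATCH) -> int:
    """Hypothetical helper: first genome offset where the EST's first `seed`
    bases agree with at most `max_mismatch` mismatches, or -1 if none."""
    probe = est[:seed]
    for start in range(len(genome) - len(probe) + 1):
        mismatches = 0
        for offset, base in enumerate(probe):
            if genome[start + offset] != base:
                mismatches += 1
                if mismatches > max_mismatch:
                    break  # abandon this start position early and shift by one
        else:
            return start  # whole seed compared without too many mismatches
    return -1
```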

[0023] The 5'-to-3' orientation is reversed where required. Within the genome sequence, 
excluding introns (regions removed after transcription), the 1,000 bases starting from the 
location of complementary match then replace the sequence in the above file. These 1,000 
bases are treated as one entry, and their agreement with the EST sequence is checked again 
at that time. EST sequences for which no complementary match is found are not replaced. 
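
The replacement described in [0023] might look like this in a simplified form. The sketch assumes that an exact match stands in for the mismatch-tolerant search of [0022]. It does not model intron removal, and it tries both strands of the genome. `reverse_complement` and `extend_entry` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_entry(est: str, genome: str, length: int = 1000) -> str:
    """Hypothetical helper: replace a short EST by `length` genome bases
    taken from the position where it matches, trying both strands.

    Simplifications (assumptions): exact matching only and introns are not
    skipped. An EST with no match is returned unchanged, as [0023] specifies.
    """
    for strand in (genome, reverse_complement(genome)):
        pos = strand.find(est)
        if pos != -1:
            return strand[pos:pos + length]
    return est
```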

[0024] (5) In this way, as shown in Drawing 8, a new database file specialized for the 
differential display method can be built (step S15). 

[0025] (6) Next, it is checked whether writing to the new database file has been completed for 
all entries. If not, processing returns to step S11; if so, it proceeds to the next step 
(step S16). 

[0026] The primer candidate base sequence search described below is performed on this new 
database file. 

[0027] Some conditions are imposed that are convenient when the primers are actually used in 
differential display experiments: palindromes are excluded, and so are sequences containing 
three or more consecutive T's. The number of bases n is also chosen (6 or 8 bases by 
default). The resulting sequences are used as the base sequences for primer matching. 
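
The exclusion rules of [0027] can be sketched as a filter over all 4^n words. Two readings are assumptions here. "Palindrome" is taken in the molecular-biology sense of a word equal to its own reverse complement. The T rule rejects any word containing TTT. The function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(seq: str) -> bool:
    """Assumed meaning: the word equals its own reverse complement."""
    return seq == seq.translate(COMPLEMENT)[::-1]


def candidate_words(n: int = 6):
    """Hypothetical generator: all length-n words passing both exclusion rules."""
    for bases in product("ACGT", repeat=n):
        word = "".join(bases)
        if is_palindrome(word) or "TTT" in word:  # run of three or more T
            continue
        yield word
```

For n = 6 the filter starts from 4,096 words. It removes the 64 self-complementary words and every word containing TTT.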
[0028] (7) Each of the 4^n short base sequences of length n is taken in turn (a round robin). 
A homology (similarity) search over the above new database file then determines how many 
times in total, and in which entries, the sequence occurs. 
[0029] This homology search runs much faster than usual for two reasons. The search position 
within an entry is shifted as soon as even one base is not complementary, and the program is 
specialized for short sequences. The entries and locations that each n-base sequence 
complements are also saved to another file as a complementary database (step S17). 
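
The per-word search of [0028] and [0029] scans every word against every entry. The sketch below swaps in a different technique: a hash index built by reading each entry once. For short words this yields the same exact-match counts and locations. It records occurrences of the word itself on the given strand. To count binding sites on the opposite strand, one would look up the reverse complement instead; this is an assumption about how "complementary" should be modelled. `build_site_index` is a hypothetical name.

```python
from collections import defaultdict


def build_site_index(entries: dict, n: int = 6) -> dict:
    """Hypothetical complementary database: map each length-n word to
    {entry name: [0-based positions where the word occurs exactly]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, seq in entries.items():
        for pos in range(len(seq) - n + 1):
            index[seq[pos:pos + n]][name].append(pos)
    return index
```

For a word `w`, `len(index[w])` is the number of entries it hits. `sum(len(p) for p in index[w].values())` is its total number of sites.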
[0030] (8) The n-base sequences examined in the above round robin are sorted in descending 
order of the number of entries they complement (step S18). 
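
Step S18 amounts to a sort on the number of entries hit. The Python sketch below assumes the complementary database is held as {word: {entry: [positions]}}. Ties are broken by total sites and then alphabetically, which is an assumption the patent does not state. `rank_candidates` is a hypothetical name.

```python
def rank_candidates(index: dict, top: int = 100) -> list:
    """Hypothetical step S18: return (word, entries hit, total sites),
    most entries first. Tie-breaking by total sites, then alphabetically,
    is an assumption."""
    rows = [(word, len(hits), sum(len(p) for p in hits.values()))
            for word, hits in index.items()]
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows[:top]
```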

[0031] (9) Next, they are written out to a primer candidate file as candidate primer base 
sequences (step S19). A primer can also be chosen by inspecting this file directly. 

[0032] (10) Next, it is checked whether writing to the primer candidate file has been 
completed. If not, processing returns to step S17; if so, it ends (step S20). 

[0033] Some experiments do not require the primer group selection described next. For these, 
primer base sequences can also be chosen from this file as appropriate. 

[0034] Next, steps S5 and S6 shown in Drawing 2 are described in detail. A genetic algorithm 
uses the above primer candidate file and complementary database to choose a combination of 
primers. The chosen combination confirms the expression of as many entry genes as possible. 
The number of candidates is 4^n, where n is the number of bases (see step S16). Evaluating 
every combination of the original primer candidate sequences exhaustively would require 
astronomical computing power. A genetic algorithm makes choosing the optimal primer group 
dramatically faster, so the computation becomes feasible even on a comparatively small 
computer. 
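
The scale of the exhaustive search mentioned in [0034] can be checked with a little arithmetic. The Python lines below assume the defaults of 6-base words and 8 primers per combination. They count the possible combinations over all 4^6 words and over only the top 100 candidates.

```python
from math import comb

n = 6                      # bases per primer (default in [0027])
words = 4 ** n             # every possible 6-base word
set_size = 8               # primers per combination (default in [0039])

all_sets = comb(words, set_size)  # exhaustive search over all words
top_sets = comb(100, set_size)    # restricted to the top 100 candidates

print(f"{words} words, {all_sets:.2e} sets of {set_size}, {top_sets:,} from the top 100")
```

The exhaustive count is roughly 2 × 10^24. Even the top-100 restriction leaves about 1.9 × 10^11 combinations to score.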
[0035] Drawings 5 and 6 are explanatory views of how genetic algorithm processing outputs 
combinations of primers that can confirm the expression of many entry genes. 

[0036] The concrete algorithm is as follows. (1) First, the entry evaluation arrays are 
created and initialized (step S21). 

[0037] At this point, as shown in Drawing 7, an entry-number array is built for each primer 
candidate base sequence when the complementary database is created. One bit is assigned to 
each entry. The bits for the entry numbers that the candidate complements are set to 1, and 
all other bits are set to 0. 

[0038] This serves to compress the storage area, and there is no need to insist on one bit 
per entry. Depending on the algorithm, computing with one byte per entry can be faster. 
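
The one-bit-per-entry arrays of [0037] can be sketched with Python's arbitrary-precision integers standing in for bit arrays, which is an assumption of this illustration. The input is taken to be a {word: {entry: positions}} mapping plus a fixed list of entry names. `build_bitmasks` is a hypothetical name.

```python
def build_bitmasks(index: dict, entry_names: list) -> dict:
    """Hypothetical step S21: one bit per entry, so bit i of a word's mask
    is 1 when the word complements entry_names[i] and 0 otherwise.
    A Python int is used as the bit array (an assumption)."""
    bit_of = {name: i for i, name in enumerate(entry_names)}
    masks = {}
    for word, hits in index.items():
        mask = 0
        for name in hits:
            mask |= 1 << bit_of[name]
        masks[word] = mask
    return masks
```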

[0039] Next, combinations of several primer candidate base sequences (8 by default) are chosen 
with random numbers. Each combination is one row of a two-dimensional array. The candidates 
are drawn from the top several dozen to 100 or more (100 by default). Two kinds of row are 
excluded: rows that contain the same sequence twice, and rows identical to another row. 
Several dozen such "combinations" are made, for example "AGAAAA CTGGAA CATTAA". 

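The initial rows of [0039] can be sketched as follows. The sketch assumes a pool of distinct candidate words and a population of 40 rows, standing in for the patent's "several dozen". Each row is stored sorted so that the same combination in another order counts as a duplicate. `initial_population` is a hypothetical name.

```python
import random
from math import comb


def initial_population(pool: list, rows: int = 40, size: int = 8,
                       rng=None) -> list:
    """Hypothetical sketch: `rows` random rows of `size` distinct candidates,
    with no duplicate rows. rows=40 is an assumed population size."""
    rng = rng or random.Random()
    if comb(len(set(pool)), size) < rows:
        raise ValueError("pool too small for the requested number of rows")
    population, seen = [], set()
    while len(population) < rows:
        row = tuple(sorted(rng.sample(pool, size)))  # sample() never repeats an item
        if row not in seen:
            seen.add(row)
            population.append(row)
    return population
```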
[0040] (2) Next, for each row of the two-dimensional array for the genetic algorithm (see 
Drawing 7), the number of entries complemented is calculated (step S22). This number is the 
evaluation value. An OR operation is applied to the entry-number arrays created above for the 
primer candidate base sequences in the combination, and the number of 1 bits is counted. In 
other words, this concretely calculates how many entries the primer candidate group matches. 

[0041] This calculation runs at high speed because it uses OR operations, and because the 
entry-number array is only shifted and its 1 bits counted. In a genetic algorithm, computing 
the evaluation values is the most time-consuming part, so accelerating it is highly 
significant. 
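
The evaluation of [0040] reduces to an OR followed by a population count. A minimal Python sketch follows, assuming each candidate word maps to an integer bit mask with one bit per entry. `evaluate` is a hypothetical name.

```python
def evaluate(row, masks: dict) -> int:
    """Hypothetical step S22: number of entries hit by at least one primer
    in the row (OR the masks together, then count the 1 bits)."""
    covered = 0
    for word in row:
        covered |= masks[word]
    return bin(covered).count("1")  # int.bit_count() does the same on Python 3.10+
```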

[0042] (3) Next, the rows are sorted in descending order of the number of complemented 
entries, that is, the evaluation value (step S23). 

[0043] (4) Next, the recombination operation of the genetic algorithm is performed. The primer 
candidate groups are first sorted in descending order of evaluation value; because this uses 
a hash, it is fast. The rows are then handled by rank: 
- The top 1/4 of the primer groups by evaluation value are kept as they are. 
- For the rows from the top 1/4 to the top 1/2 of the two-dimensional array, two combinations 
are chosen at random from the top 1/4. Each primer candidate sequence of the new row is 
taken, by random numbers, from one or the other of these two combinations. 
- For the bottom 1/2, the base sequence candidates are chosen entirely by random numbers. 
Rows containing the same primer candidate sequence twice, and rows identical to another 
combination, are excluded and chosen anew by random numbers (step S24). The primer candidate 
sequence file is shown in Drawing 8. 
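
One recombination step of [0043] might be sketched as follows. The sketch makes several assumptions. Rows are sorted tuples of distinct candidates. Crossover picks, slot by slot, from two random elite parents aligned by sorted position. Invalid or duplicate rows are replaced by fresh random rows. `next_generation` is a hypothetical name.

```python
import random


def next_generation(ranked: list, pool: list, rng: random.Random) -> list:
    """Hypothetical step S24 on rows already sorted best-first.

    Assumes rows are sorted tuples of distinct candidates and that `pool`
    holds enough distinct words to fill the population.
    """
    rows, size = len(ranked), len(ranked[0])
    elite = ranked[:max(1, rows // 4)]          # top 1/4 carried over unchanged
    new, seen = list(elite), set(elite)

    def random_row():
        return tuple(sorted(rng.sample(pool, size)))

    while len(new) < rows:
        if len(new) < rows // 2 and len(elite) > 1:
            # top 1/4 to 1/2: each slot comes from one of two random elite parents
            mother, father = rng.sample(elite, 2)
            row = tuple(sorted(rng.choice(pair) for pair in zip(mother, father)))
        else:
            row = random_row()                  # bottom 1/2: chosen fully at random
        while len(set(row)) < size or row in seen:
            row = random_row()                  # repeated primer or duplicate row: redraw
        seen.add(row)
        new.append(row)
    return new
```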

[0044] (5) Next, steps S22 to S24 are repeated several hundred times or more (200 times by 
default) (step S25). In this way, the optimal primer candidate base sequence group can be 
selected. 
[0045] (6) When the condition of step S25 is satisfied, the optimal primer candidate base 
sequence groups are output in descending order of rank (step S26). The optimal primer 
candidate sequence group file is shown in Drawing 9. 

[0046] This invention is not limited to the above example. Many modifications are possible 
based on its spirit, and these are not excluded from the scope of this invention. 
[0047] 

[Effect of the Invention] As described in detail above, this invention provides the 
following effects. 

[0048] (A) The specific base sequences considered best for primers can be output in 
descending order of rank. The 3'-end base sequences of gene cDNAs are thought to differ 
subtly between species. Even so, optimal primers can be obtained in a comparatively short 
time by computing on the cDNA database of the species concerned. 

[0049] (B) If the differential display method is performed with the primer group chosen by the 
genetic algorithm, the presence or absence of expression of almost all cDNAs can in theory be 
investigated with only a few primers. The program of this invention was actually run on the 
expressed gene database of the yeast Saccharomyces cerevisiae. It was found that even a 
single base sequence matches many entries at many locations. Such a sequence is predicted to 
bind to 70% or more of the entries whose 3'-end is indicated in the whole yeast EST 
(expressed gene) database. This is thought to be about 10% more than the number of entries 
matched by commercially available primers chosen by rule of thumb. 

[0050] (C) This system can produce a best primer group combination for yeast. With a 
combination of eight kinds of 6-base primers, the expression of 92% or more of all entries in 
the EST (expressed gene) database is thought to be investigable. This is much cheaper and 
simpler than performing PCR with an immense number (thousands) of conventional primers and 
investigating by DNA array. 

[0051] Genome projects for various species are currently advancing. This system can be run 
on databases created in the future, so it should aid the functional analysis of genes and 
the acquisition of new genes by the differential display method. 
DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1] A configuration diagram of the primer base sequence selection system, showing an 
example of this invention. 

[Drawing 2] An outline flowchart of primer base sequence selection according to this 
invention. 

[Drawing 3] A flowchart of primer base sequence selection, showing an example of this 
invention. 

[Drawing 4] An explanatory view of the homology search in primer base sequence selection, 
showing an example of this invention. 

[Drawing 5] A flowchart of the genetic algorithm in primer base sequence selection, showing 
an example of this invention. 

[Drawing 6] An explanatory view of the genetic algorithm in primer base sequence selection, 
showing an example of this invention. 

[Drawing 7] An explanatory view of the data structure processed by the genetic algorithm in 
primer base sequence selection, showing an example of this invention. 
[Drawing 8] An example of a primer candidate file in primer base sequence selection, showing 
an example of this invention. 
[Drawing 9] An example of an optimal primer candidate base sequence group file in primer 
base sequence selection, showing an example of this invention. 
[Description of Notations] 

1 Input Unit 

2 Output Unit 

3 Input-Process Section of EST (Expressed Gene) Database and Genome Base 
Sequence 

4 New Database Creation Processing Section 

5 Complementary Location Count of Each Entry and Short Base Sequence, Complementary 
Location Database Creation, and Sorting Application Section 

6 Primer Candidate Base Sequence Output Section 

7 Genetic Algorithm Processing Section 

8 Optimal Primer Candidate Base Sequence Group (Combined) Output Section 
10 Processor (CPU/Memory) 
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s*aar*aax5*7HiB«MeMsaa»£* 
Bar * r £ * tttt t r ^ X5 < v-asEsioaaa 

So 

[0001] 

T'$5T-f77l/>->y/l/ - T-fX7U-r^ <fcy*J 
<iiKllc<SfflT*3r3<i:5lc-ta, aeMT'yl/zruX^lc.fc 

[0002] 



tf$8T*£*„ D N A±0Jie ; F«*iB5'J« R N A ICE? 

[0003] 2^dn AaaEaigtasaicj:?-?* 

-WZ>a£1flS*i4§fcftTC^XV AD N Att*E8k 
$ fcBff (c£9E LTloftae? ©iMSE^Jx-* # * 

[0004] z.ti*><» : 7 : - / z<r>mw$>fflti\<J)Tcmz. □ 

7-/U /W*Q5><f- 

computa t i ona I molecular 
biology). £ 3 1/* li 7* 7 x-< y 
*X (£4MMH4¥ rbioinformatics) 

[0005] *fe #?*»*©s»?» aeraa* 

^•S. *<D-0<hLT^££<m3ftTl^5<7)tfPCR 

?-®-iT*S57<77 l/>->t)l/ • 7<X7H' (D 
ifferential Display) j£liv mR 
N AtfeUfffiELfcc DN A (») fc, <fcy£<<DcDN 

At^-rsiSton^^SEjij^jtofc^'r^- 

3' -3fcig<DPo I y-AfB5iJlc»l5-r«X5l'T 
— £TP C R HS&£T*;& »J , fco 

[0 0 0 6] 

[^6W3iLJ:5i:-r^iSM] Lft-L^S. f-f7 

fts ZtltT-m-OD N A^Sffi5iJ<t U P C Riffl©7 
^-f^-^a^-rsXP^AJiS-p/cAv J:U^<© 

[0 0 0 7] *Wm. ±IB©B3M*^<R3JU f-f7 
7l/Vi/fJl/ • 7^XXUY£T^I«il*&l£<J:y$<8) 
^WJctbRL. «r*iae : flSl»<D{i**S«)5fi:«)v fi 

as x^ ? -^SE^jroa^Tj ^sif * <dsb*ji# 

[0008] 



(3) 
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(b) SuiB (a) isiciti^enfc^-r^-iefiifr 

&o \Z L/c *,©?£*. 
[0009] C2) ±IB (1) iB&©73YV-igSiE 
$U®aje*S6fcfe^T» «E»«ie?T-r S"^-7it 
8E9J©5Sl}ft±s «SBBe?x— 0 0 
0 ^Slc^fc^t^^lc/ / AtgfiE5U<£:©<fc^n 5*- 

[0 0 10] C3] ±E (1) fB&©73-f7-it»K 
5!l©»E**tefcl*T* B5fBt§«13$U©SIg©5&£fcJ\ 

[00 11] C4) t&**Slffr*^P-fe'y9-t. t- 

£y/A£SiE9J<h£A^T3A;ttttl3IS15<h. £©A*1 
iQSSBfre ©r— 5i (C*^Tr-5« ©JDIfcfrl^T L 
i%T-*^-Xffc£ff3Srx-$^-X<kfflSffi<fc> & 

x> h g-ts*SE5"j©«ffifaiiitiiiffi*fffiiix- 

©i»5W7/Hf M XAffiJ]ISMFS©7~ S» fc»"3^T« 

ftffttSE9J&ai*l8£*Jl<ir«<l: 3lc Lfcfc©T-;B 

[0 0 12] ±IB<D«t5»!:«fi£LftOT% CftST'fflli 
T'25o/c7 r -i'77' U>->-WU • xV77U-ncga&7 

[00 13] 

[0014] si \t*m}<om&mzmrzf i 7<fv-m 

*«B5"J©»£->Xxi*©*>MM*7 t-H*S 

[0 0 15] HUCSfJ:3tc» *»»D75-f?-tt 
81B9J©S^->XxAtt. ft*S£fT?3«031gB (C 
PU/^t'J) 10t, 7-*£A73-f-5A*)gB1 
<!:. C©Artg^1 frSAfl^tifcx-SfclBirfSfB 



fflSSHI 0ti\ EST r-^-Xt 

yy^«iB5U©A73«iasP3, est rue?) 

7-*^-X • y/A*WBM[c«-3Wi:x-*l]0Ifc 

ge5y©ffl*nasftfiv aattBr— ^-xfcaifv- 

[0 0 16] -?-LT> 75-fT-*«B5"J©a*#m 
lis H2lc5*T«fc3fc, (D EST (j^ffiifi?) t 

D , (2) *©est asase?) t— Sf^-Xt 

yy ix^»BB5iJ KS^/c x— 9 HOIK <fc *«f 5 s - r ? * 
-XffcttWSfrV Uf»?S2) , (3) £l>HJ 
-&fiJtSE9J®JSffl&»1-ir> ffiSffilf-: 5* ^-7. 
Ik&tfV- hWHrfil* Uf77S3) » (4) 73 

(5) ^-T^-BWftMBJlHH^-^SWSU *n 

(6) Xxy7S5lCfe^TYES©li^fCtt7 r 5-1'? 
5(Cfc^TNO©lf^Clis ^©Xx^'vii** iHi 

»7;u=rux^a«ffc> Ufy^se) » (7) » 

Ur'>7S7) «t3lcLT^ 0 

[0 0 17] m2lZtSlt2,±=&XTv7S2~X9-vZr 
S4ti. gfiHiai. 03U:5Vt<i:3lc, m^tiZo 

[0 0 1 8] <DE ST (B9|9e?) 7-^^-7., «• 
IVhU-lCfc^T, r3' -jfcSgj ©*-7-K£8 
SmtZ&CDfr^ 3' 0 0 0^S»©SB 

w*ffl77-f^wc»*air. -otv. est oms 

?) x-^/^-x©, T^T©c DNAH«iE9iJfr6* 
<fctBT*$£3' -5fcS£Po I y-AEHJ:U1 000 

a*#©E*i*aftu ai77"riWc»*mr (x?* 

7S 1 1) . 

[0 0 19] ©*lc. *-©ESTSB9Jtt. S<Tt.5 

o o ttxrauakKiittjftfreft&icftss 

IB9J^fi«©1t«±. ^gG oae 

ffiiEr* (AxyT'S 12). 

[0 0 2 0] <3>$i\Z % MEL/cEST@B9JT'St"t©s 
CCT'»i> 1 0 0 0W&K.mrc1tV't,<MftoZ&§tP* 
fiyvtS Ury^S 1 3) . 

[002 1] QBkWs Z\<D& 3 leffi IE Lfc E S TE9J 

S3 Uf-^S 1 4) „ 



(4) 
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[0 0 2 2] r.©i§£, H4fcgft\J:5lc % ESTBH9J 

o)t&mm\7kfcmjjo 2 5 &SE9j* 2 ®rftVL±o>ft® 
T*^**fl^tirca^s *v a©** n*j-v-*n 

©^x-r^xtcty, a*©-*- x7;uzryxAtc 

tt<T#1IHC»51lC* J EP^— y-x*ff 5 Z t&T-Z 
3. 

[0 0 2 3] lffll|-aLfcttlKr5, V/U6Bm<»4 

5. «Sf65«, 5' -#6 3' -©fl**a»LT» 
1 0 0 0HS#, ±E77*-f;l/«t>©E$!l£Aft9;lT* 

*mr. *©i ooo^s^ixvhu-i-rs. * 

©Rfc, ESTE9J&-HRLTlvBfrHg?xy9T 
«. */t««-«Lft^ESTB5iJlc-3^Ttt» Aft# 

[0 0 2 4] ©CftlCfcoT* eiixtf. ni8tcm-r<fc5 
ic, xV77l/>$>*JU« rVXTWSlc^fbLfc, 

S1 5) . 

[0 0 2 5] <B$UZs ^TOXVMJ-tC-^TWfcfc 

■»J» H7LTl*W£ &©X7 1 v7'\iI6 Ufv7 
S1 6) . 

[0 0 2 6] JKT©^5-fV-«MttSK5>jaSRtt. E 
©Sr x- * ^-7ffl 7 7"T /U£ ffiffl L Tff 5 , 
[0 0 2 7] tV77»UV->*;U« x-fX^U-f^T'* 

y *^TE5U**A,f£*»©ttl**k tWt&ffit* L 

[0028] ©4©niiffl©ffi«SE5ij©*^ man 

ifilxyMJ ffi§ , JT'*inv f - a® 
[0 0 2 9] C©*=ECI-7-tf- *T'tt* 18l7t.fi 

fittx-^-XiiLT, SSlc»j73*-fJUc«SLT 
£< Ut77S 1 7) . 

[0 0 3 0] ®±EI83fcUTWfelHI>E*l©3 
* (T.x-yT'S 1 8) . 

[003 1] ©ale, 75-f 7-HgE9J©fl£*S<!: LT 



9) „ CftSBlSjaTr5'fV-*»Rr*Ctij«-p* 

[0 0 3 2] (10)&K, 7^-r^-<gffi7y"r;U'\©# 

^7LTl^6^/cS7.x->7'S 1 7'\M I A S^dB*^ 
H7LTl*fcSxvK£f* Ux';/7S2 0) . 

[0033] ttT©^7'T7-ff«auer%ie«ii ( «^ 

JUfc©«£« C©77'f;Utl k 675-r7-*»E5iJ*51 

[0 0 3 4] E2lC/XxLfc7x->7S 5fc<i:lfX 

7;UJyXXxlcJ;oT. ±1275^-^7 7"fM/<h 
fflffix-^-xsiijfflLT* «ty*<©x>hy-s 
e?©»R*«4i»«6 6 ft* 75 -r 7-©«*£to**a 
,y„ E©fsa^b-e-w\ 7c©75i'7-<g^E9j3bM 
mm ©a am §i£* (xx>>7s 1 6#bd © 

/{■7-tf«2ssic&*. cti«aew7;i/dryx^*f"jffl 

?«E£(;:J:y. KgT^-f^-SSaSlTSXb?- K 
tfHWttli:±tf »J % itRW'J^«©H-IWlT-t,H-ltRl«g 

[0 0 3 5] mSJZXf®6ltmis.<*l7)\>3*)XL>y!mt 
£ y £ < ©x y h y ->i£7©&gi£5Sfr 466 ft* 75 

[0 0 3 6] Jitttta7jl/3yXi»£LT« 

( 1 ) sr. x> h y -mHESHfts • suiMfc-rs a 

T7 7S2 1) . 

[0 0 3 7] £©Bf. 07fcaVrJ:5fc, $?\ &75 
<7HI»!ttHEaiS£» 1 0©x> h U -« 1 K» h 
i: LffllLfti> h U-**K»J6r« 1 tf y h*1 (c 
U *ftJ.Xfl-£ 0 K Lfcx > h y -RE?J*±Effl»r 
- * "C- XttfiKBS icflFo T fc < . 

[0 0 3 8] CfttilBH^ii£E*rf.5/'c4&T\ 1 ^© 

IS 1/tf hTttWLftftft 7ibJyXi»[c«fc-3Ttt 

[0039] a»ic*y» 75-r7-<ga^sE 

7cE5U©«5U*»+*6B«± (x7*;Uhtt1 0 0) 
SA,T*fc<. nUii*^*)* GM9J) T% l?l-©E9J5 

( "AGAAAA CTGGAA C A T T A A" H© 

[0040] ( 2 ) jmc» iteW7';u J 'J X Affl 2 
E9J (07 #80 «5K-3^Tfflx>hy-i:«i 
f*©36Mt»*?f5 (Xx>v7S2 2) . -3*y, {5JX 

>hu-t«*Lrfc<rfl!)m»*inw*. sa^t>H* 

x> K 'J -CkE9JlcO R iSW^AHt, h*^ 1 ©K« 



(5) 
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a**. ^sy, 7?4*-mfflSB9mi*. Attn 

t0 04 1] OR-S»&OT*if>lT*35»A IVh'J-S 

mux-**. asw7/i/rr'jXAT'tt, mn^tfirr 

[0042] o) a«Lfca (iwaw o±tt 
[0043] (4) [Illegible in source; the legible fragments refer to genetic-algorithm operations on the upper 1/4, on the range from the upper 1/4 to 1/2, and on 1/4 of the arrays, to a single primer candidate sequence, and to step S24.]

[0044] (5) Steps S22 to S24 are then repeated (default: 200 times). [Remainder illegible in source.]
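Paragraphs [0040] to [0045] describe a genetic-algorithm loop. The legible fragments mention ranking in step S22, an upper 1/4 versus 1/2 split in step S24, and repetition with a default of 200. The exact operators in [0043] are illegible, so this is only a minimal sketch under these assumptions:

- Each individual is a set of eight distinct primers, as in the rows of Figs. 7 and 9.
- Fitness is the number of entries covered.
- The top quarter of the ranked population survives unchanged.
- The rest is refilled by one-point crossover plus an occasional single-primer mutation.

Reading the legible defaults 100 and 200 as population size and generation count is also an assumption. The names `evolve`, `mutation_rate` and `seed` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def coverage(primer_set: Sequence[str], masks: Dict[str, int]) -> int:
    """Entries covered by the set: popcount of the OR of per-primer bit arrays."""
    combined = 0
    for primer in primer_set:
        combined |= masks[primer]
    return bin(combined).count("1")


def evolve(candidates: List[str], masks: Dict[str, int], set_size: int = 8,
           pop_size: int = 100, generations: int = 200,
           mutation_rate: float = 0.1, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Return the best primer set found and the best fitness seen in each generation."""
    rng = random.Random(seed)

    def fitness(individual: List[str]) -> int:
        return coverage(individual, masks)

    population = [rng.sample(candidates, set_size) for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)      # rank by coverage
        history.append(fitness(population[0]))
        elite = population[: max(2, pop_size // 4)]     # assumed: keep the top quarter
        children: List[List[str]] = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, set_size)
            # One-point crossover that keeps the primers of a child distinct.
            child = (a[:cut] + [p for p in b if p not in a[:cut]])[:set_size]
            if rng.random() < mutation_rate:
                unused = [p for p in candidates if p not in child]
                if unused:
                    child[rng.randrange(set_size)] = rng.choice(unused)
            children.append(child)
        population = elite + children
    population.sort(key=fitness, reverse=True)
    return population[0], history


# Hypothetical toy data; the real input would be the ranked 6-mer candidates.
entries = ["AAAAAAC", "CTTTTTT", "GTATATA", "CCCGGGA", "ACGTACG", "TTTTTTA"]
candidates = ["AAAAAA", "TTTTTT", "TATATA", "CCCGGG", "ACGTAC", "GGGGGG"]
masks = {p: sum(1 << i for i, s in enumerate(entries) if p in s) for p in candidates}
best, history = evolve(candidates, masks, set_size=3, pop_size=20, generations=30)
print(sorted(best), coverage(best, masks))
```

Because the elite is copied forward unchanged, the best coverage never decreases from one generation to the next.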

[0045] (6) [Illegible in source; refers to step S25.]

[0046] [Illegible in source.]
[0047]

[Effect of the Invention] [Illegible in source.]

[0048] (A) [Illegible in source; the legible fragments refer to primers, to cDNA and to its 3' end.]

[0049] (B) [Illegible in source; the legible fragments refer to cDNA, to a sequence database of Saccharomyces cerevisiae, to entries, to an EST (expressed gene) database, to the 3' end of the entries, and to primers.]

[0050] (C) [Illegible in source; the legible fragments refer to the primer sequences obtained, to 92% or more of the entries in the EST (expressed gene) database, and to PCR and DNA.]

[0051] [Illegible in source; the legible fragments refer to a database and to the differential display method.]

[Brief Description of the Drawings]

[Fig. 1] [Illegible in source; mentions primers.]

[Fig. 2] [Illegible in source; a flow relating to primer candidate sequences.]

[Fig. 3] A flowchart relating to primer candidate sequences. [Partly illegible in source.]

[Fig. 4] [Illegible in source; mentions primer sequences.]

[Fig. 5] A flowchart of the genetic algorithm used for primer selection. [Partly illegible in source.]

[Fig. 6] [Illegible in source.]

[Fig. 7] [Illegible in source; mentions primers.]

[Fig. 8] [Illegible in source; mentions primers.]

[Fig. 9] [Illegible in source; mentions primer sequences.]
[Description of Reference Numerals]

1 [Illegible in source]

2 [Illegible in source]

3 EST (expressed gene) database [remainder illegible]

4 [Illegible] database [remainder illegible]

5 [Illegible in source; refers to entries and sequences]

6 Primer candidate sequence [remainder illegible]

7 Genetic algorithm [remainder illegible]

8 [Illegible in source]

10 [Illegible] (CPU/memory)



[Drawing: block 10 (CPU/memory); the remaining labels are illegible in source.]



[Fig. 6]

Panels (a), (b) and (c). Legible drawing labels: the 6-mers ATGCGG, GCTCGA, CTGACC, ATCGGT and ACACAC, shown once in that order and once in the order ATGCGG, CTGACC, GCTCGA, ATCGGT, ACACAC.



[Fig. 2]

Flowchart with steps S1, S2, S3, S4, S6 and S7 legible and a decision branch marked YES. One box refers to the EST (expressed gene) database; the other box text is illegible in source.






JP 2000-308487 A



[Fig. 3]

Flowchart with steps S11 to S20 (the box between S15 and S17 has no legible step number). Box S11 refers to the EST (expressed gene) database, and a box near the end refers to primers. The other box text is illegible in source.









[Fig. 4]

(a) [Label partly illegible in source; refers to an EST entry.]

ATCGTACCTCCCTAGCTAGTCGGCTAGCTACCTCGCTCTCTGGGCTGCGCT [label illegible]
ATCGTAGCGCGCGATCGTAGCCCAACACGATGCCGATGCATCGTACATCGGT [label illegible; refers to the entry]
GAGGTACGTCGC (3' end)

GCACGA






[Fig. 5]

Flowchart with steps S21 to S26 and a decision branch marked YES. An annotation refers to the upper 1/4 (a), the range from the upper 1/4 to 1/2 (b), and 1/2 (c); its wording is otherwise illegible in source.












[Fig. 7]

Entry array (array) [label partly illegible in source]
entry #0: oooooo
1 bit
entry #n: 0

Two-dimensional array used by the genetic algorithm (array) [label partly illegible in source]

AGCGCT CGAGCA AAGTTA CGCGTT ATGCTG AAACCC TATCTG CGCCGT
GCTAGC GGAGGC TAGCCA CACAGA ATGTGG CAACAA GAGTAC CGAACT
ACCTGC CGAGGA
AACGTC

(100 [illegible])



[Fig. 8]

Primer [6] mer. Total of sequences: 2798
Mean of all sequences: 331.5797

No.    Sequence   How many matched (Seq. covered)   Mean / Std dev. of loc.
0      AAAAAA     2422 (968)                        241.6841 / 102.9200
4095   TTTTTT     2186 (836)                        239.3683 / 122.7736
819    TATATA     2032 (1077)                       266.9060 / 88.3254
3276   ATATAT     1930 (1003)                       258.4896 / 92.9694
3072   AAAAAT     1459 (1101)                       230.6381 / 105.5594
4030   CTTCTT     1360 (847)                        163.0235 / 98.4548
768    AAAATA     1287 (980)                        235.7964 / 103.6617
48     AATAAA     1254 (830)                        236.5343 / 105.0341
12     ATAAAA     1240 (941)                        243.2855 / 102.1883
3      TAAAAA     1225 (934)                        237.8808 / 105.4180
192    AAATAA     1220 (884)                        237.8787 / 105.0386
4031   TTTCTT     1183 (882)                        184.0896 / 104.0615
64     AAAGAA     1167 (885)                        219.8955 / 101.7196
4079   TTCTTT     1158 (890)                        180.6209 / 104.9007
3055   TTCTTC     1155 (771)                        160.4450 / 96.4877
16     AAGAAA     1152 (917)                        216.8681 / 103.5133
3840   AAAATT     1092 (869)                        217.0421 / 110.0152
4047   TTATTT     1087 (856)                        242.9724 / 107.6030
1024   AAAAAG     1079 (850)                        223.3411 / 108.7999
4092   ATTTTT     1050 (859)                        211.7924 / 116.4093
3264   AAATAT     1040 (861)                        238.1173 / 102.2256
2047   TTTTTG     1032 (847)                        196.0669 / 113.7874
256    AAAAGA     1012 (817)                        228.4733 / 105.6524
4      AGAAAA      999 (827)                        224.8398 / 100.7646
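The "No." column of Fig. 8 appears to be a base-4 index of each 6-mer. Every row of the table fits the coding A = 0, G = 1, C = 2, T = 3 with the first base as the least significant digit. For example, AGAAAA = 1 × 4 = 4 and TTTTTG = 1023 + 1 × 1024 = 2047. This reading is inferred from the table and is not stated in the legible text.

The sketch below computes that index together with statistics in the style of the table's columns. It rests on three assumptions:

- "How many matched" counts every (overlapping) occurrence.
- "Seq. covered" counts the entries that contain the 6-mer at least once.
- "loc." is the 0-based offset from the start of the entry as stored.

`kmer_index` and `kmer_stats` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set, Tuple

# Base coding inferred from the "No." column of Fig. 8.
BASE_CODE = {"A": 0, "G": 1, "C": 2, "T": 3}

Row = Tuple[int, str, int, int, float, float]


def kmer_index(kmer: str) -> int:
    """Base-4 index with the first base as the least significant digit."""
    return sum(BASE_CODE[base] * 4 ** i for i, base in enumerate(kmer))


def kmer_stats(entries: Sequence[str], k: int = 6) -> List[Row]:
    """Rows of (No., sequence, matched, seq. covered, mean loc., std dev. loc.),
    most frequently matched k-mer first."""
    positions: Dict[str, List[int]] = defaultdict(list)
    covered: Dict[str, Set[int]] = defaultdict(set)
    for i, seq in enumerate(entries):
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            if all(base in BASE_CODE for base in kmer):  # skip N and other codes
                positions[kmer].append(pos)
                covered[kmer].add(i)
    rows = [(kmer_index(kmer), kmer, len(locs), len(covered[kmer]),
             float(mean(locs)), pstdev(locs))
            for kmer, locs in positions.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Hypothetical toy input; the third entry contains an N and yields no 6-mers.
for row in kmer_stats(["AAAAAAA", "CAAAAAA", "ACGTNACGTA"]):
    print(row)
```

Whether the original used population or sample standard deviation, and from which end positions were measured, cannot be recovered from the table.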




Primer [6] mer. Total of sequences: 2364

Combination                                                 How many seq. covered
TATATA TCCAAA TTCCTT TTTTTT AAGAAG TTTGAA TTATTT TTCAAA     2359
ATTTAT CTTCTT GAAGAA TTTTTT TATATA AAAATT TTATTT TTCAAA     2358
ATTTAT TCCAAA TTCCTT TTTTTT AAGAAG TTTGAA TTATTT TTCAAA     2358
ATTTAT TCCAAA TTCCTT [illegible] TATATA GAAAGA TTGGAA TTCAAA     2358
AATTTT GAAGAA ACAAAG TTTAAT TATATA AAAATT TTAAAT TTCAAA     2358
TATATA TCAAAA TTCCTT TTTTTT AAGAAG TTTGAA TTATTT TTCAAA     2357
ATTTAT CTTCTT AATTTT TTTTTT AAAATT GAAAGA TTTTCA TTCAAA     2357
TATATA TCCAAA AATTTT TTTTTT TGATTT TTTGAA TTATTT TTCAAA     2357
AAAATG CTTCTT TTATTT TTTTTT ATTGAA AAAATT CAACAA TTTGAA     2357
AAAATG CTTCTT GAAGAA [illegible] TATATA AAAATT TTATTT TTCAAA     2357
AAAATG TCCAAA TTATTT TTTTTT ATTGAA AAAATT CAACAA TTTGAA     2357
TATATA TCCAAA TTTATT TTTTTT TTTGGT TTTGAA TCAAAT TCAAAA     2357
TTCTTT AAAATT AAATCA TTTTTT TATATA GAAAGA TTTTCA TTCAAA     2357
AAAATG TCCAAA CTTTGG TTTTTT TATATA TAATTT TTATTT TTCAAA     2357
ATTTAT CTTCTT GAAGAA TTTTTT TATATA ATATTA TTCAAA TTTGAA     2357
TTTTTG TGAAGA AAATCA TTTTTT AGAATG AAAATT TTCTTG TTCAAA     2357
AAAATG TCCAAA TTCCTT TTCTTT TATATA TTTGAA TTATTT TTCAAA     2357
AAAATG TCCAAA TTCAAA TTTTTT TATATA TTTGAA TTATTT TTCTTG     2357
AAAATG TCCAAA ATAAAG TTTTTT TATATA TTTGAA TTATTT TTCAAA     2357
ATTTAT TGAAGA AATTTT TTTTTT AAAATT GAAAGA TTCAAA TTCTTG     2357
ATTTAT CTTCTT GAAGAA TTTTTT TATATA TTTGAA TTATTT TTCAAA     2357
AAAATG TCCAAA TTTTTA TTTTTT TATATA TTTGAA TTATTT TTCAAA     2357
TTCTTT AAAATT AAATCA TTTTTT AAGAAT TATGCT AATTTT AGAAAG     2357
TATATA TCCAAA AATTTT TTTTTT AAAATG TTTGAA TTATTT TTCAAA     2357
ATTTAT AAAATT GAAGAA TTTTTT TATATA TTTGAA TTATTT TTCAAA     2357
ATTTAT CTTCTT GAAGAA TTTTTT TATATA TTTGAA TTATTT TTCAAA     2357



[Bibliographic field illegible in source]

F-terms (reference) 4B024 AA20 BA80 HA11
4B063 QA13 QQ42
5B075 PP22 PQ72 PR04 PR06 QM08 UU19
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